Abstract

The capacity of a quantum channel for transmission of classical information
depends in principle on whether product states or entangled states are used
at the input, and whether product or entangled measurements are used at the
output. We show that when product measurements are used, the capacity of the
channel is achieved with product input states, so that entangled inputs do
not increase capacity. We show that this result continues to hold if
sequential measurements are allowed, whereby the choice of successive
measurements may depend on the results of previous measurements.

We also present a new simplified expression which gives an upper bound for
the Shannon capacity of a channel, and which bears a striking resemblance to
the well-known Holevo bound.

1 Introduction

1.1 Overview 

Bennett and Shor note that there are, in principle, four basic types of channel
capacities for "classical" communication using quantum signals, i.e.,
communications in which signals are sent using an "alphabet" of pure states of
quantum systems and decoded using measurements on the (possibly mixed state)
signals which arrive. The mixed states are the result of noise which is
represented by a stochastic or completely positive, trace-preserving map
$\Phi$. The four possible capacities correspond to using product or entangled
states at the input, and using product or entangled measurements at the
output. These are denoted as follows:

$C_{PP}$ product signals and product measurements

$C_{PE}$ product signals and entangled measurements

$C_{EP}$ entangled signals and product measurements

$C_{EE}$ entangled signals and entangled measurements

In more precise language "using product" means restricting to products and
"using entangled" means using arbitrary (product or entangled) states or
measurements. Hence, it is evident that
$C_{PP} \le \{C_{EP}, C_{PE}\} \le C_{EE}$. The main purpose of this note is
to show that $C_{PP} = C_{EP}$, i.e., that if one is restricted to using
product measurements, then using entangled inputs does not increase the
capacity. Thus $C_{PP} = C_{EP} \le C_{PE} \le C_{EE}$. It is known || [9], [K|
that one can have strict inequality in $C_{EP} < C_{PE}$ for certain
non-unital channels. The question of whether or not one can have strict
inequality in $C_{PE} \le C_{EE}$ is open, although numerical evidence
pL [23] suggests equality.



1.2 Notation and Definitions 

To give precise definitions, we use relatively standard notation in which
$\mathcal{M} = \{E_b\}$ denotes a "positive operator valued measurement"
(POVM), i.e., $E_b \ge 0$ and $\sum_b E_b = I$. Let $\rho_j$ denote a set (or
alphabet) of pure state density matrices, $\pi_j$ a discrete probability
vector, and $\rho = \sum_j \pi_j \rho_j$. We let
$\mathcal{E} = \{\pi_j, \rho_j\}$ denote this ensemble of input states. Both
$E_b$ and $\rho_j$ are operators on a Hilbert space $\mathcal{H}$, so that the
stochastic map $\Phi$ (representing the noise in the channel) acts on
$\mathcal{B}(\mathcal{H})$, the algebra of bounded operators on $\mathcal{H}$.
We will write $\tilde{\mathcal{E}} = \{\pi_j, \Phi(\rho_j)\}$ for the ensemble
of output states emerging from the channel.

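We write the dual of $\Phi$ (or adjoint with respect to the Hilbert-Schmidt
inner product) as $\hat{\Phi}$ so that
$\mathrm{Tr}\,[\Phi(\rho) E] = \mathrm{Tr}\,[\rho\, \hat{\Phi}(E)]$. The
adjoint of a stochastic map takes a POVM $\mathcal{M} = \{E_b\}$ to another
POVM $\hat{\mathcal{M}} = \{\hat{E}_b\}$ with $\hat{E}_b = \hat{\Phi}(E_b)$,
since the trace-preserving condition on $\Phi$ is equivalent to
$\hat{\Phi}(I) = I$.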
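The duality relation can be checked numerically. The following is a minimal sketch, assuming a qubit amplitude-damping channel written with Kraus operators as $\Phi(\rho) = \sum_k K_k \rho K_k^\dagger$, so that $\hat{\Phi}(E) = \sum_k K_k^\dagger E K_k$. The channel, the damping parameter `g`, the test matrices and all function names are illustrative assumptions, not taken from the text.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]

def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def trace(a):
    return a[0][0] + a[1][1]

ZERO = [[0.0, 0.0], [0.0, 0.0]]

def channel(kraus, rho):
    """Phi(rho) = sum_k K rho K^dagger (assumed Kraus convention)."""
    out = ZERO
    for k in kraus:
        out = add(out, matmul(matmul(k, rho), dagger(k)))
    return out

def dual(kraus, e):
    """Dual map: Phi_hat(E) = sum_k K^dagger E K."""
    out = ZERO
    for k in kraus:
        out = add(out, matmul(matmul(dagger(k), e), k))
    return out

g = 0.3  # hypothetical damping parameter
kraus = [[[1.0, 0.0], [0.0, math.sqrt(1 - g)]],
         [[0.0, math.sqrt(g)], [0.0, 0.0]]]
rho = [[0.5, 0.5], [0.5, 0.5]]   # the pure state |+><+|
e = [[0.7, 0.1], [0.1, 0.2]]     # an operator with 0 <= E <= I

lhs = trace(matmul(channel(kraus, rho), e))   # Tr[Phi(rho) E]
rhs = trace(matmul(rho, dual(kraus, e)))      # Tr[rho Phi_hat(E)]
unit = dual(kraus, [[1.0, 0.0], [0.0, 1.0]])  # Phi_hat(I), expected to be I
```

The two traces agree up to rounding, and `unit` is the identity, which is the trace-preserving condition expressed on the dual side.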

The information content of a noiseless quantum channel with a fixed input 
ensemble and a fixed POVM can be described using the standard Shannon formula 
of classical information theory. 

Definition 1 For a fixed ensemble $\mathcal{E} = \{\pi_j, \rho_j\}$ and a POVM
$\mathcal{M} = \{E_b\}$ on a Hilbert space $\mathcal{H}$, the quantum mutual
information is given by

$$ I^q(\mathcal{E}; \mathcal{M}) = S\big(\mathrm{Tr}[\rho E_b]\big) - \sum_j \pi_j\, S\big(\mathrm{Tr}[\rho_j E_b]\big), \tag{1} $$

where $S(\mathrm{Tr}[\rho E_b])$ denotes the Shannon entropy
$-\sum_b p_b \log p_b$ of the probability vector with elements
$p_b = \mathrm{Tr}[\rho E_b]$ (and similarly for
$S(\mathrm{Tr}[\rho_j E_b])$).
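Formula (1) is straightforward to evaluate for small examples. The sketch below assumes a qubit, logarithms to base 2, and a hypothetical ensemble (the states $|0\rangle$ and $|+\rangle$ with equal weights) measured in the computational basis; the function names are illustrative.

```python
import math

def trace_product(a, b):
    """Tr(a b) for two 2x2 matrices given as nested lists."""
    return sum(a[i][k] * b[k][i] for i in range(2) for k in range(2)).real

def shannon(probs):
    """Shannon entropy (in bits) of a probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 1e-15)

def mutual_information(weights, states, povm):
    """I(E; M) = S(Tr[rho E_b]) - sum_j pi_j S(Tr[rho_j E_b]), as in (1)."""
    cond = [[trace_product(rho_j, e_b) for e_b in povm] for rho_j in states]
    avg = [sum(w * row[b] for w, row in zip(weights, cond))
           for b in range(len(povm))]
    return shannon(avg) - sum(w * shannon(row) for w, row in zip(weights, cond))

rho_zero = [[1.0, 0.0], [0.0, 0.0]]   # |0><0|
rho_plus = [[0.5, 0.5], [0.5, 0.5]]   # |+><+|
povm = [[[1.0, 0.0], [0.0, 0.0]],     # E_0 = |0><0|
        [[0.0, 0.0], [0.0, 1.0]]]     # E_1 = |1><1|

info = mutual_information([0.5, 0.5], [rho_zero, rho_plus], povm)
```

For this choice the outcome distribution is (0.75, 0.25), and `info` is about 0.311 bits.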

The information content of a noisy channel defined by the stochastic map
$\Phi$ is obtained from (1) by replacing $\mathcal{E}$ by the output ensemble
$\tilde{\mathcal{E}} = \{\pi_j, \Phi(\rho_j)\}$.

Alternatively, since
$\mathrm{Tr}\,[\Phi(\rho_j) E] = \mathrm{Tr}\,[\rho_j \hat{\Phi}(E)]$, we could
instead choose to regard the "noise" as acting on the POVM, and obtain the
capacity from (1) by replacing $\mathcal{M}$ by $\hat{\mathcal{M}}$. Although
this viewpoint is atypical, it can be useful, as we will see in Section 4.

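Definition 2 For a stochastic map $\Phi$, an input ensemble
$\mathcal{E} = \{\pi_j, \rho_j\}$ and a POVM $\mathcal{M} = \{E_b\}$, the
quantum information content is given by

$$ \begin{aligned} I^q_\Phi(\mathcal{E};\mathcal{M}) &= I^q(\tilde{\mathcal{E}};\mathcal{M}) = I^q(\mathcal{E};\hat{\mathcal{M}}) \qquad (2) \\ &= S\big(\mathrm{Tr}[\Phi(\rho) E_b]\big) - \sum_j \pi_j\, S\big(\mathrm{Tr}[\Phi(\rho_j) E_b]\big). \end{aligned} $$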

We consider memoryless channels in which multiple uses of the channel are
described by the n-fold tensor product
$\Phi \otimes \Phi \cdots \otimes \Phi$ acting on the tensor product Hilbert
space $\mathcal{H} \otimes \mathcal{H} \cdots \otimes \mathcal{H}$, which we
denote by $\Phi^{\otimes n}$ and $\mathcal{H}^{\otimes n}$ respectively. This
allows us to define the 'ultimate' information capacity of the channel as the
asymptotic rate achievable when entangled inputs and measurements are used.

Definition 3 The entangled signals/entangled measurements capacity of a
quantum channel is defined as

$$ C_{EE}(\Phi) = \lim_{n\to\infty} \frac{1}{n} \sup_{\mathcal{E},\mathcal{M}} I^q_{\Phi^{\otimes n}}(\mathcal{E};\mathcal{M}) \tag{3} $$

where the supremum is taken over all possible (product or entangled) signals
and measurements on $\mathcal{H}^{\otimes n}$.

To define capacity restricted to product measurements, we write
$\mathcal{M}^{\otimes n}$ for a product POVM of the form
$\{E_{b_1} \otimes E_{b_2} \cdots \otimes E_{b_n}\}$.

Definition 4 The entangled signals/product measurements capacity of a quantum
channel is defined as

$$ C_{EP}(\Phi) = \lim_{n\to\infty} \frac{1}{n} \sup_{\mathcal{E},\mathcal{M}^{\otimes n}} I^q_{\Phi^{\otimes n}}(\mathcal{E};\mathcal{M}^{\otimes n}). \tag{4} $$

Note that the existence of the limits follows from superadditivity of the
classical capacity.

The capacities $C_{PP}$ and $C_{PE}$ can be similarly defined. We write
$\mathcal{E}^{\otimes n}$ to denote an ensemble of the form
$\{\pi_{j_1 \ldots j_n},\, \rho_{j_1} \otimes \cdots \otimes \rho_{j_n}\}$,
where $\{\rho_j\}$ is a fixed collection of states, and
$\{\pi_{j_1 \ldots j_n}\}$ is some joint probability distribution.

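Definition 5 The product signals/entangled measurements capacity of a quantum
channel is defined as

$$ C_{PE}(\Phi) = \lim_{n\to\infty} \frac{1}{n} \sup_{\mathcal{E}^{\otimes n},\mathcal{M}} I^q_{\Phi^{\otimes n}}(\mathcal{E}^{\otimes n};\mathcal{M}). \tag{5} $$

Definition 6 The product signals/product measurements capacity of a quantum
channel is defined as

$$ C_{PP}(\Phi) = \lim_{n\to\infty} \frac{1}{n} \sup_{\mathcal{E}^{\otimes n},\mathcal{M}^{\otimes n}} I^q_{\Phi^{\otimes n}}(\mathcal{E}^{\otimes n};\mathcal{M}^{\otimes n}). \tag{6} $$

The additivity of classical information capacity immediately implies the
following result.

Theorem 7 The product signals/product measurements capacity of a quantum
channel is given by

$$ C_{PP}(\Phi) = C_{\rm Shan}(\Phi) = \sup_{\mathcal{E},\mathcal{M}} I^q_\Phi(\mathcal{E};\mathcal{M}), \tag{7} $$

which we call the Shannon capacity.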

A far deeper result is that $C_{PE}(\Phi)$ can be re-expressed in terms of the
well-known Holevo bound 0, [9], [I7|]. This result was proved independently in
|| and [22], building on earlier work in [[n| and [7].

Theorem 8 (Holevo-Schumacher-Westmoreland) The product signals/entangled
measurements capacity of a quantum channel is given by

$$ C_{PE}(\Phi) = C_{\rm Holv}(\Phi) = \sup_{\mathcal{E}} \Big( S[\Phi(\rho)] - \sum_j \pi_j\, S[\Phi(\rho_j)] \Big) \tag{8} $$

where $S(P) = -\mathrm{Tr}\,(P \log P)$ denotes the von Neumann entropy of the
density matrix $P$. We call this the Holevo capacity of the channel.
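The quantity inside the supremum in (8) can be evaluated directly for qubits by working with Bloch vectors. The sketch below assumes a depolarizing channel, which shrinks every Bloch vector by a factor `shrink`, and a hypothetical two-state ensemble; it computes the bracket in (8) for that one ensemble only and makes no attempt at the supremum.

```python
import math

def qubit_entropy(r):
    """von Neumann entropy (bits) of the qubit state with Bloch vector r.

    The eigenvalues of the density matrix are (1 +/- |r|)/2.
    """
    norm = math.sqrt(sum(x * x for x in r))
    eigenvalues = [(1 + norm) / 2, (1 - norm) / 2]
    return -sum(p * math.log2(p) for p in eigenvalues if p > 1e-15)

def holevo_quantity(weights, bloch_vectors, shrink):
    """S[Phi(rho)] - sum_j pi_j S[Phi(rho_j)] for r -> shrink * r."""
    outputs = [[shrink * x for x in r] for r in bloch_vectors]
    average = [sum(w * r[i] for w, r in zip(weights, outputs))
               for i in range(3)]
    return qubit_entropy(average) - sum(
        w * qubit_entropy(r) for w, r in zip(weights, outputs))

# Hypothetical ensemble: |0> and |1> with equal weights, shrink factor 0.5.
chi = holevo_quantity([0.5, 0.5], [(0, 0, 1), (0, 0, -1)], 0.5)
```

Here the average output is the maximally mixed state (entropy 1 bit) and each output state has eigenvalues 0.75 and 0.25, so `chi` is about 0.189 bits.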

1.3 Summary of Results

Our main result, that using entangled inputs with product measurements does 
not increase the capacity of a channel, can be stated as 

Theorem 9 For any stochastic map, $C_{EP}(\Phi) = C_{\rm Shan}(\Phi)$.

There is another implementation of product measurements which has the
potential for a greater capacity. It involves a sequence of POVM's on the
product spaces $\mathcal{H}^{\otimes n}$, whereby the POVM for the second
measurement depends on the result of the first measurement, the POVM for the
third measurement depends on the results of the first two measurements, and so
on. The idea is that "Bob" can choose his successive POVM's based on the
results of previous measurements. We write $C_{EP}^{\rm cond}(\Phi)$ for the
maximum asymptotic rate achievable for such a sequence of conditional POVM's,
with entangled inputs allowed. (The precise definition of a conditional POVM
is postponed to Section 4 and the capacity is given by (34).) Our next result
shows that using such conditional POVM's with entangled inputs again does not
increase the channel capacity.

Theorem 10 For any stochastic map,
$C_{EP}^{\rm cond}(\Phi) = C_{\rm Shan}(\Phi)$.

Theorem 10 was proved independently (and simultaneously), using different
methods, by P. Shor [20], and also later proved independently by A. Holevo
[12]. A conditional POVM is not the most general situation involving product
measurements, which would be a POVM in which each measurement can be written
as a tensor product. Except for the obvious bounds, we know of no results for
the capacity associated with such POVM's.

The capacity of a classical channel can be written as the (suitably restricted) 
supremum of the classical mutual information. We extend this observation to 
the quantum case, using a tensor product formulation whereby the first two (and 
possibly all four) of these basic capacities are realized using mutual information 
in the form of the relative entropy of a density matrix and the product of its 
reduced density matrices. This leads to the following upper bound. 

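Theorem 11 For any stochastic map,

$$ C_{EP}(\Phi) \le \sup_{\rho,\mathcal{M}} \Big[ S(\rho) - \sum_b S\big(\sqrt{\rho}\,\hat{\Phi}(E_b)\sqrt{\rho}\big) + S(r) \Big] $$

where $r_b = \mathrm{Tr}\,[\Phi(\rho) E_b] = \mathrm{Tr}\,[\rho\,\hat{\Phi}(E_b)]$.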
We call the quantity on the right $U_{EP}$, and we conjecture that it is equal
to $C_{EP}$, i.e., that equality holds in Theorem 11 above. We motivate and
study $U_{EP}$ in Section 2.3 where we show that it can be rewritten in a form
similar to the Holevo capacity. Combined with Theorem 9 above, this
conjectured equality would provide a simplified expression for the Shannon
capacity of any channel, whereby the sup over both input ensemble and POVM is
replaced by a sup over one average input state and the POVM.

Although the proof of Theorem 9 does not depend on our tensor product
reformulation, we present this material first, in the following section,
because we feel it gives some useful insights. Section 2 is largely
pedagogical and provides the motivation for our conjectured expression for
$C_{EP}$. Section 3 is also primarily pedagogical; it introduces the reader to
Holevo's C-Q and Q-C channels [11]. This leads to a short proof of both the
well-known Holevo bound and the new bound in Theorem 11. Moreover, the
additivity of Q-C channels implies Theorem 9 and motivates our proof of
Theorem 10. The reader primarily interested in this proof can skip directly to
Section 4.



2 Capacity from Mutual Information 

2.1 Classical background 

The classical mutual information of two random variables X and Y measures how
much information they have in common and is given by

$$ I^c(X;Y) = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}. \tag{9} $$

If X and Y represent the input and output distributions of a channel, then the
classical Shannon capacity is the supremum of $I^c(X;Y)$ taken over all
possible joint distributions allowed by the channel.

The Shannon capacity of a quantum channel can also be obtained in this way
provided that the joint distribution arises from a quantum communication
process $(\Phi, \mathcal{E}, \mathcal{M})$ as

$$ p(j,b) = \pi_j\, \mathrm{Tr}\,[\Phi(\rho_j) E_b] = \pi_j\, \mathrm{Tr}\,[\rho_j \hat{\Phi}(E_b)]. \tag{10} $$

Although the stochastic map $\Phi$ is usually regarded as noise acting on the
signals $\rho_j$, it is important to recognize that it has another
interpretation corresponding to the second expression for $p(j,b)$ in (10)
above. In the second case, the channel transmits signals faithfully, but the
"noise" distorts the measurement process by converting the POVM $\{E_b\}$ to a
modified POVM $\{\hat{E}_b = \hat{\Phi}(E_b)\}$ implemented by the action of
the dual of $\Phi$.

In order to make the transition from classical to quantum communication, it
is sometimes useful to consider a classical probability vector $p(x)$ as the
diagonal of a matrix $P$. We can then write the relative entropy

$$ H(P, Q) = \mathrm{Tr}\,[P \log P - P \log Q] \tag{11} $$

in a form which reduces to the usual classical expression when $P$ and $Q$ are
diagonal, but is also valid when $P$ and $Q$ are density matrices representing
mixed quantum states. In this notation (9) becomes

$$ I^c(X;Y) = H[P_{12}, P_1 \otimes P_2] \tag{12} $$

where $P_{12}$, $P_1$, and $P_2$ are diagonal matrices with non-zero entries
$p(x,y)$, $p(x)$ and $p(y)$ respectively.
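For diagonal matrices the relative entropy form reduces to $S(P_1) + S(P_2) - S(P_{12})$, which can be compared with the direct sum over the joint distribution. The following sketch assumes base-2 logarithms and a hypothetical joint distribution (a binary symmetric channel with flip probability 0.1 and uniform input); the function names are illustrative.

```python
import math

def entropy(probs):
    """Shannon entropy (bits) of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def marginals(joint):
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return px, py

def mutual_information(joint):
    """Direct formula: sum_{x,y} p(x,y) log[p(x,y) / (p(x) p(y))]."""
    px, py = marginals(joint)
    return sum(p * math.log2(p / (px[x] * py[y]))
               for x, row in enumerate(joint)
               for y, p in enumerate(row) if p > 0)

def relative_entropy_form(joint):
    """H[P_12, P_1 (x) P_2] for diagonal matrices: S(P_1) + S(P_2) - S(P_12)."""
    px, py = marginals(joint)
    flat = [p for row in joint for p in row]
    return entropy(px) + entropy(py) - entropy(flat)

flip = 0.1  # hypothetical crossover probability
joint = [[0.5 * (1 - flip), 0.5 * flip],
         [0.5 * flip, 0.5 * (1 - flip)]]

direct = mutual_information(joint)
via_entropy = relative_entropy_form(joint)
```

Both routes give the same number, about 0.531 bits for this distribution.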

2.2 Tensor Product Reformulation 

A reformulation and generalization of mutual information and capacity can be
made using formal tensor products. It should be emphasized that this is done
for convenience of notation and is distinct from the tensor products used in
describing multiple uses of the channel. Let
$\mathcal{H}_{ABQR} = \mathbf{C}^J \otimes \mathbf{C}^M \otimes \mathcal{H} \otimes \mathcal{H}$
where $j = 1 \ldots J$, $b = 1 \ldots M$ and
$\mathcal{H}_Q = \mathcal{H}_R = \mathcal{H}$ is the original Hilbert space on
which $\rho$ and $E_b$ act. The partial traces $T_A$ and $T_B$ then correspond
to sums over the block indices $j$ and $b$ respectively, and
$T_Q = T_R = \mathrm{Tr}$.

Let $P_{ABQ}$ be the block diagonal matrix with blocks
$\pi_j \sqrt{\Phi(\rho_j)}\, E_b \sqrt{\Phi(\rho_j)}$ and $\hat{P}_{ABQ}$ the
block diagonal matrix with blocks
$\pi_j \sqrt{\rho_j}\, \hat{\Phi}(E_b) \sqrt{\rho_j}$.
Then $P_{AB} = T_Q P_{ABQ} = T_Q \hat{P}_{ABQ} = \hat{P}_{AB}$ and

$P_{AB}$ is a diagonal matrix with (non-zero) elements
$p(j,b) = \pi_j\, \mathrm{Tr}\,[\Phi(\rho_j) E_b]$,

$P_A = T_{BQ} P_{ABQ} = T_B P_{AB}$ is a diagonal matrix with elements
$\delta_{ij} \pi_j$,

$P_B = T_{AQ} P_{ABQ} = T_A P_{AB}$ is a diagonal matrix with elements
$\delta_{ab} r_b$ where
$r_b = \mathrm{Tr}\,\Phi(\rho) E_b = \mathrm{Tr}\,\rho\,\hat{\Phi}(E_b)$ as in
Theorem 11.

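It is straightforward to verify that

$$ \begin{aligned} C_{PP} = C_{\rm Shan}(\Phi) &= \sup_{\mathcal{E},\mathcal{M}} \big[ S(P_B) - S(P_{AB}) + S(P_A) \big] \qquad (13) \\ &= \sup_{\mathcal{E},\mathcal{M}} H(P_{AB}, P_A \otimes P_B) = \sup_{\mathcal{E},\mathcal{M}} I^q_\Phi(\mathcal{E};\mathcal{M}) \\ &= \sup_{\tilde{\mathcal{E}},\mathcal{M}} I^q(\tilde{\mathcal{E}};\mathcal{M}) = \sup_{\mathcal{E},\hat{\mathcal{M}}} I^q(\mathcal{E};\hat{\mathcal{M}}) \qquad (14) \end{aligned} $$

where the last line in (14), although redundant, is included to emphasize the
fact that we can suppress the explicit dependence on $\Phi$ by using either a
restricted ensemble with $\tilde{\rho}_j = \Phi(\rho_j)$ or a restricted POVM
of the form $\hat{\Phi}(E_b)$.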

Note that all the matrices in (13) above are diagonal and could be replaced
by probability vectors. The quantum character of the channel is hidden in the
fact that $P_{AB}$ must be the reduced density matrix of a $P_{ABQ}$ of the
form above with quantum blocks. Thus we might have replaced
$\sup_{\mathcal{E},\mathcal{M}}$ above by either
$\sup_{P_{ABQ}} H(P_{AB}, P_A \otimes P_B)$ or
$\sup_{\hat{P}_{ABQ}} H(\hat{P}_{AB}, \hat{P}_A \otimes \hat{P}_B)$ with the
understanding that the supremum was to be taken over those $P_{ABQ}$ or
$\hat{P}_{ABQ}$ with the block diagonal form given above.

We can find a similar expression for the Holevo capacity by noting that
$P_{AQ} = T_B P_{ABQ}$ is a block diagonal matrix with blocks
$\pi_j \Phi(\rho_j)$, and

$$ P_Q = T_{AB} P_{ABQ} = T_A P_{AQ} = \Phi(\rho). $$

It is again straightforward to verify that

$$ \begin{aligned} C_{PE} \equiv C_{\rm Holv}(\Phi) &= \sup_{\mathcal{E}} \big[ S(P_Q) - S(P_{AQ}) + S(P_A) \big] \qquad (15) \\ &= \sup_{\mathcal{E}} H(P_{AQ}, P_A \otimes P_Q). \end{aligned} $$

We can interpret this as a classical to quantum mutual information between
the classical probability distribution $\pi_j$ of the input alphabet and the
average quantum distribution $\Phi(\rho)$ which emerges from the channel.

We conclude by observing that the entanglement assisted capacity of 0] can be
written in a similar way as

$$ \sup \big\{ H(\rho_{QR}, \rho_Q \otimes \rho_R) : \rho_{QR} = (\Phi \otimes I)(|\Psi\rangle\langle\Psi|) \big\} \tag{16} $$

with $\Psi \in \mathbf{C}^2 \otimes \mathbf{C}^2$. This differs slightly from
eq. (4) of ||. However, because $|\Psi\rangle\langle\Psi|$ is pure, their
$S(\rho) = S[T_2(|\Psi\rangle\langle\Psi|)] = S[T_1(|\Psi\rangle\langle\Psi|)] = S(\rho_R)$
in our notation. Thus the expression in (16) above is equivalent to eq. (4)
there. This is a form of quantum to quantum mutual information between the
subsystems of an entangled pair, one of which is subjected to noise via
transmission through the channel.

We also expect that the capacity $C_{EE}$ can be expressed as a (different)
quantum to quantum mutual information. Unfortunately the precise form has
eluded us. This approach does, however, lead in a natural way to a new
expression related to $C_{EP}$.

2.3 Proposed expression for $C_{EP}$

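To motivate our new candidate for $C_{EP}$, we let $P_{BR}$ be the block
diagonal matrix with blocks $\sqrt{\rho}\,\hat{\Phi}(E_b)\sqrt{\rho}$. Then

$P_B = T_R P_{BR}$ is a diagonal matrix with elements $r_b$,

$P_R = T_B P_{BR} = \rho$,

and define

$$ \begin{aligned} U_{EP}(\Phi) &= \sup_{\mathcal{M},\rho} \big[ S(P_R) + S(P_B) - S(P_{BR}) \big] \\ &= \sup_{\mathcal{M},\rho} H(P_{BR}, P_R \otimes P_B) \qquad (17) \\ &= \sup_{r_b,\gamma_b} \Big[ S(\gamma) - \sum_b r_b\, S(\gamma_b) \Big] \qquad (18) \end{aligned} $$

where $\gamma_b = \frac{1}{r_b} \sqrt{\rho}\,\hat{\Phi}(E_b)\sqrt{\rho}$ and
$\gamma = \sum_b r_b \gamma_b = \rho$. The last form (18) looks like the Holevo
capacity with the input ensemble $\mathcal{E} = \{\pi_j, \rho_j\}$ replaced by
a new "output measurement ensemble" $\{r_b, \gamma_b\}$. How can we
characterize this ensemble? Using Kraus operators we can write
$\Phi(\rho) = \sum_k A_k^\dagger \rho A_k$, where
$\sum_k A_k A_k^\dagger = I$. It follows that
$\gamma_b = \frac{1}{r_b} \sum_k B_k^\dagger E_b B_k$ with
$B_k = A_k^\dagger \sqrt{\rho}$. Hence $\gamma_b$ is a density matrix in the
range of a completely positive map which, rather than being trace-preserving
or unital, satisfies $\sum_k B_k B_k^\dagger = \Phi(\rho)$. If we define
$\Gamma_\rho(P) = \sqrt{\rho}\,\hat{\Phi}(P)\sqrt{\rho}$ we can write

$$ U_{EP}(\Phi) = \sup_{\rho,\mathcal{M}} \Big[ S[\Gamma_\rho(I)] - \sum_b r_b\, S\big[\Gamma_\rho\big(\tfrac{1}{r_b} E_b\big)\big] \Big]. \tag{19} $$

A different characterization is given in the next section as a condition on
$P_{BR}$.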

We can interpret (17) as a quantum to classical mutual information between
the average input $\rho$ and the classical probability vector $r_b$ associated
with the correspondingly averaged output measurements
$\mathrm{Tr}\,[\rho\,\hat{\Phi}(E_b)]$.
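The passage from the mutual information form (17) to the Holevo-like form can be written out in a few lines. The computation below is a sketch using only the definitions above, with the convention (stated here as an assumption) that $S(X) = -\mathrm{Tr}\,X \log X$ is also applied to the unnormalized blocks $r_b \gamma_b = \sqrt{\rho}\,\hat{\Phi}(E_b)\sqrt{\rho}$ of $P_{BR}$.

```latex
\begin{align*}
S\big(\sqrt{\rho}\,\hat{\Phi}(E_b)\sqrt{\rho}\big)
  &= S(r_b \gamma_b)
   = -\mathrm{Tr}\,\big[ r_b \gamma_b \log (r_b \gamma_b) \big]
   = r_b\, S(\gamma_b) - r_b \log r_b , \\
S(P_{BR})
  &= \sum_b S(r_b \gamma_b)
   = S(r) + \sum_b r_b\, S(\gamma_b) , \\
S(P_R) + S(P_B) - S(P_{BR})
  &= S(\rho) + S(r) - S(r) - \sum_b r_b\, S(\gamma_b)
   = S(\gamma) - \sum_b r_b\, S(\gamma_b) .
\end{align*}
```

The first line uses $\mathrm{Tr}\,\gamma_b = 1$, and the last uses $\gamma = \sum_b r_b \gamma_b = \rho$.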
We conjecture that $U_{EP} = C_{EP}$ although we can only show
$U_{EP} \ge C_{EP}$, which is proved in the next section. Note if $\Phi$ is
the completely noisy channel which maps every density matrix to the (suitably
normalized) identity, then $P_{BR} = P_B \otimes P_R$ so that
$H(P_{BR}, P_B \otimes P_R) = 0$ as expected. This also holds if $\rho$ is a
one-dimensional projection.

2.4 Optimization constraints 

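We can rewrite all of these expressions for capacity as the suitably
constrained supremum of an "Input-Output" mutual information,
$H(\rho_{IO}, \rho_I \otimes \rho_O)$, i.e.,

$$ \sup \big\{ H(\rho_{IO}, \rho_I \otimes \rho_O) : \rho_{IO} \text{ is a density matrix in } X_{IO} \big\} \tag{20} $$

where the subset $X_{IO}$ lies in $\mathcal{A}_I \otimes \mathcal{A}_O$ and the
algebra $\mathcal{A}$ is either $\mathbf{C}^{n \times n}$ or $\mathbf{D}^n$,
the algebra of diagonal $n \times n$ matrices. We will let
$\mathcal{Q} = \{E : 0 \le E \le I\}$ denote the set of positive semi-definite
operators less than the identity, $\mathcal{D}$ the set of density matrices,
and ${\le}\mathcal{D}$ the set of positive semi-definite matrices with trace
$\le 1$, i.e., the set of matrices $\lambda P$ where $P$ is a density matrix
and $0 \le \lambda \le 1$.

$C_{PP}$: $X_{IO} = \big\{ \rho_{AB} = \mathrm{Tr}_Q\, \rho_{ABQ} : \rho_{AQ}^{-1/2}\, \rho_{ABQ}\, \rho_{AQ}^{-1/2} \in \mathbf{D}^n \otimes \mathbf{D}^n \otimes \hat{\Phi}(\mathcal{Q}) \big\}$.

In the case of maps on $\mathbf{C}^{2 \times 2}$ we expect this to be a subset
of $\mathbf{D}^2 \otimes \mathbf{D}^2$ although, in principle, it could be a
subset of $\mathbf{D}^4 \otimes \mathbf{D}^4$.

$C_{PE}$: $X_{IO} = \big\{ \rho_{AQ} : \rho_{AQ} \in \mathbf{D}^n \otimes \Phi({\le}\mathcal{D}) \big\}$.

$U_{EP}$: $X_{IO} = \big\{ \rho_{BR} : \rho_R^{-1/2}\, \rho_{BR}\, \rho_R^{-1/2} \in \mathbf{D}^n \otimes \hat{\Phi}(\mathcal{Q}) \big\}$.

$C_{EE}$: We know only that $X_{IO} \subset \mathbf{C}^{n \times n} \otimes \mathbf{C}^{n \times n}$.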

In order to conclude that these expressions are equivalent to those given
previously, we need to verify that when $\rho_{IO}$ is in the indicated set,
one can always find a corresponding ensemble $\mathcal{E}$ and/or POVM
$\mathcal{M}$. The block diagonal conditions implicit in the notation above and
the fact that $\Phi$ and $\hat{\Phi}$ are trace-preserving and identity
preserving respectively, make this quite straightforward.

When $n = 2$, we can describe $\mathcal{Q}$ explicitly by writing
$E = w_0 I + \mathbf{w} \cdot \sigma$ where
$\sigma = (\sigma_x, \sigma_y, \sigma_z)$ denotes the formal vector of Pauli
matrices and $\mathbf{w}$ in $\mathbf{R}^3$. Then $0 \le E \le I$ if and only
if $|\mathbf{w}| \le \min\{w_0, 1 - w_0\}$ so that

$$ \mathcal{Q} = \bigcup_{w_0 \in [0,1]} \big\{ E = w_0 I + \mathbf{w} \cdot \sigma : |\mathbf{w}| \le \min\{w_0, 1 - w_0\} \big\}. $$
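The criterion for $n = 2$ follows from the fact that the eigenvalues of $w_0 I + \mathbf{w} \cdot \sigma$ are $w_0 \pm |\mathbf{w}|$. The sketch below compares the stated criterion with a direct spectral check on a few hypothetical sample points; the function names and the tolerance are illustrative choices.

```python
import math

def effect_matrix(w0, w):
    """E = w0*I + w.sigma as a 2x2 Hermitian matrix."""
    wx, wy, wz = w
    return [[w0 + wz, complex(wx, -wy)],
            [complex(wx, wy), w0 - wz]]

def eigenvalues(e):
    """Eigenvalues of a 2x2 Hermitian matrix from its trace and determinant."""
    half_trace = (e[0][0] + e[1][1]).real / 2
    det = (e[0][0] * e[1][1] - e[0][1] * e[1][0]).real
    radius = math.sqrt(max(half_trace ** 2 - det, 0.0))
    return half_trace - radius, half_trace + radius

def is_effect_by_spectrum(w0, w, tol=1e-12):
    """Check 0 <= E <= I through the eigenvalues of E."""
    low, high = eigenvalues(effect_matrix(w0, w))
    return low >= -tol and high <= 1 + tol

def is_effect_by_criterion(w0, w, tol=1e-12):
    """Check |w| <= min{w0, 1 - w0}."""
    norm = math.sqrt(sum(x * x for x in w))
    return norm <= min(w0, 1 - w0) + tol

samples = [(0.5, (0.3, 0.0, 0.4)),   # boundary case: |w| = 0.5
           (0.8, (0.3, 0.0, 0.0)),   # violates E <= I
           (0.2, (0.1, 0.1, 0.1)),   # interior point
           (0.2, (0.0, 0.3, 0.0))]   # violates E >= 0
agree = all(is_effect_by_spectrum(w0, w) == is_effect_by_criterion(w0, w)
            for w0, w in samples)
```

The two tests return the same answer on every sample.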

3 Bounds via Q-C Channels

Holevo [11] introduced an extremely useful family of stochastic maps of the
form

$$ \Omega(\rho) = \sum_k R_k\, \mathrm{Tr}\,(\rho X_k) \tag{21} $$

where $R_k$ is a family of density matrices and $X_k$ is a POVM. He also
distinguished two important subclasses of these channels:

$\Omega_{QC}$ Quantum-classical channels in which
$R_k = |e_k\rangle\langle e_k|$ so that each density matrix is a
one-dimensional projection from an orthonormal basis $\{e_k\}$.

$\Omega_{CQ}$ Classical-quantum channels in which
$X_k = |e_k\rangle\langle e_k|$ so that the POVM is a partition of unity
arising from an orthonormal basis $\{e_k\}$.
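A quantum-classical channel simply records the measurement probabilities on the diagonal of its output. The following sketch assumes a qubit input and a hypothetical three-outcome "trine" POVM with real matrix elements; the POVM and the function names are illustrative and not taken from the text.

```python
import math

def trace_product(a, b):
    """Tr(a b) for two 2x2 real matrices given as nested lists."""
    return sum(a[i][k] * b[k][i] for i in range(2) for k in range(2))

def qc_channel(rho, povm):
    """Diagonal of Omega(rho) = sum_k |e_k><e_k| Tr(rho X_k)."""
    return [trace_product(rho, x_k) for x_k in povm]

def trine_povm():
    """X_k = (2/3)|psi_k><psi_k| with psi_k = (cos(k*pi/3), sin(k*pi/3))."""
    povm = []
    for k in range(3):
        c, s = math.cos(k * math.pi / 3), math.sin(k * math.pi / 3)
        povm.append([[2 * c * c / 3, 2 * c * s / 3],
                     [2 * c * s / 3, 2 * s * s / 3]])
    return povm

rho_zero = [[1.0, 0.0], [0.0, 0.0]]          # input state |0><0|
output_diagonal = qc_channel(rho_zero, trine_povm())
```

For the input $|0\rangle\langle 0|$ the output diagonal is $(2/3, 1/6, 1/6)$, a classical probability vector embedded as a diagonal density matrix.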

Holevo [11] showed that the quantum capacity of such channels is additive,
i.e.,

$$ C_{PE}(\Phi_{QC} \otimes \Phi_{QC} \cdots \otimes \Phi_{QC}) = C_{PE}(\Phi_{QC}^{\otimes n}) = n\, C_{PE}(\Phi_{QC}) $$

and similarly for $C_{PE}(\Phi_{CQ}^{\otimes n}) = n\, C_{PE}(\Phi_{CQ})$. In
the next section, we use Holevo's strategy for proving additivity for
$\Phi_{QC}$ to prove Theorem 10.

We now show that both the celebrated "Holevo bound"
$C_{PP}(\Phi) \le C_{PE}(\Phi)$ and the new bound
$C_{PP}(\Phi) \le U_{EP}(\Phi)$ follow easily from the monotonicity of
relative entropy under $\Omega_{QC}$ channels. Our strategy is similar to one
used earlier by Yuen and Ozawa [25].

In the first case, we let $\Omega_{QB}$ be a Q-C map of the form (21) with
$X_b = E_b$ and $R_b = |e_b\rangle\langle e_b|$. Then

$$ \begin{aligned} H(P_{AB}, P_A \otimes P_B) &= H[\Omega_{QB}(P_{AQ}), \Omega_{QB}(P_A \otimes P_Q)] \\ &\le H(P_{AQ}, P_A \otimes P_Q) \qquad (22) \end{aligned} $$

where $P_{AQ}$ and $P_{AB}$ are as in Section 2.2 and we have suppressed the
identity in $I \otimes \Omega_{QB}$. Taking the supremum over $\mathcal{E}$
(and over $\mathcal{M}$ on the left) yields $C_{PP}(\Phi) \le C_{PE}(\Phi)$.
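The inequality between the two mutual informations can be illustrated for a single ensemble and a single measurement. The sketch below assumes the noiseless case ($\Phi$ the identity), a hypothetical qubit ensemble ($|0\rangle$ and $|+\rangle$ with equal weights) described by Bloch vectors, and a projective measurement along the z axis; it compares the measured mutual information with the Holevo quantity of the same ensemble.

```python
import math

def shannon(probs):
    """Shannon entropy (bits) of a probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 1e-15)

def outcome_probs(r, axis):
    """Outcome probabilities (1 +/- r.m)/2 for a projective measurement along m."""
    dot = sum(a * b for a, b in zip(r, axis))
    return [(1 + dot) / 2, (1 - dot) / 2]

def qubit_entropy(r):
    """von Neumann entropy from the eigenvalues (1 +/- |r|)/2."""
    norm = math.sqrt(sum(x * x for x in r))
    return shannon([(1 + norm) / 2, (1 - norm) / 2])

weights = [0.5, 0.5]
states = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]   # Bloch vectors of |0> and |+>
axis = (0.0, 0.0, 1.0)                         # measure in the z basis

average = [sum(w * r[i] for w, r in zip(weights, states)) for i in range(3)]

# Left side: classical mutual information of the measurement outcomes.
measured = shannon(outcome_probs(average, axis)) - sum(
    w * shannon(outcome_probs(r, axis)) for w, r in zip(weights, states))

# Right side: Holevo quantity S(rho) - sum_j pi_j S(rho_j).
holevo = qubit_entropy(average) - sum(
    w * qubit_entropy(r) for w, r in zip(weights, states))
```

For this example `measured` is about 0.311 bits and `holevo` is about 0.601 bits, consistent with the bound.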

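For the new bound, let $\Omega_{RA}$ be a Q-C map of the form (21) with
$X_j = \pi_j\, \rho^{-1/2} \rho_j\, \rho^{-1/2}$ and
$R_j = |e_j\rangle\langle e_j|$, so that $\Omega_{RA}(P_{BR}) = P_{AB}$. Then

$$ \begin{aligned} H(P_{AB}, P_A \otimes P_B) &= H[\Omega_{RA}(P_{BR}), \Omega_{RA}(P_B \otimes P_R)] \\ &\le H(P_{BR}, P_B \otimes P_R) \end{aligned} $$

from which it follows that $C_{PP}(\Phi) \le U_{EP}(\Phi)$, which by Theorem 9
is the bound of Theorem 11.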
Remark: It may appear that the argument in (22) above yields a simple proof
of the Holevo bound without using the strong subadditivity (SSA) of relative
entropy [15] as in [21]. However, Lindblad [16] made the useful observation
that any stochastic map can be represented as the partial trace after
interaction with an auxiliary system, i.e.,
$\Phi(P) = T_B[U_{AB}\, P \otimes E_B\, U_{AB}^\dagger]$. In fact, he used this
representation to obtain monotonicity as a corollary of SSA. Thus, the
arguments used to obtain the Holevo bound via monotonicity (as above or in
[25]) and via SSA (as in [21]) are essentially equivalent. In the latter
approach, an auxiliary system is added explicitly and then discarded; in the
former, this is done implicitly via Lindblad's representation theorem. Further
discussion of the history of the closely connected properties of SSA,
monotonicity of relative entropy and the joint convexity of relative entropy
is given in [18, 19, 24].



4 Proof of Additivity Using Q-C Channels 

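Theorem 9 can be obtained from Holevo's result [11] that
$C_{\rm Holv}(\Phi_{QC})$ is additive, i.e., if $\Gamma$ is a Q-C channel of
the form following (21), then $C_{\rm Holv}(\Gamma)$ is additive. To show how
this follows, we define

$$ \Gamma_{\Phi,\mathcal{M}}(P) = \sum_b |e_b\rangle\langle e_b|\, \mathrm{Tr}\,[P\, \hat{\Phi}(E_b)]. \tag{23} $$

Then $\Gamma_{\Phi,\mathcal{M}}$ is a Q-C channel with
$X_b = \hat{\Phi}(E_b)$. Moreover,
$\sup_{\mathcal{E}} I^q_\Phi(\mathcal{E};\mathcal{M}) = C_{\rm Holv}(\Gamma_{\Phi,\mathcal{M}})$,
and the additivity of $C_{\rm Holv}(\Gamma_{\Phi,\mathcal{M}})$ implies
$\sup_{\mathcal{E}} I^q_{\Phi^{\otimes n}}(\mathcal{E};\mathcal{M}^{\otimes n}) = C_{\rm Holv}(\Gamma_{\Phi,\mathcal{M}}^{\otimes n}) = n\, C_{\rm Holv}(\Gamma_{\Phi,\mathcal{M}})$.
Then Theorem 9 follows from

$$ C_{\rm Shan}(\Phi) = \sup_{\mathcal{E},\mathcal{M}} I^q_\Phi(\mathcal{E};\mathcal{M}) = \sup_{\mathcal{M}} C_{\rm Holv}(\Gamma_{\Phi,\mathcal{M}}). $$

In order to prove Theorem 10 we will need to extend Holevo's result. Our
extension, which we present below, follows Holevo's strategy [11] with the
identity (27) replacing subadditivity. This also provides a self-contained
proof of Theorem 9 since a product measurement is a special case of a
conditional measurement.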

First consider a product channel with Hilbert space
$\mathcal{H}_1 \otimes \mathcal{H}_2$ and noise operator
$\Phi_1 \otimes \Phi_2$. Let $\mathcal{E}_{12} = \{\pi_j, \rho_j\}$ be an
ensemble of possibly entangled input states on
$\mathcal{H}_1 \otimes \mathcal{H}_2$. Let $\mathcal{M}_1 = \{E_b\}$ denote the
POVM on $\mathcal{H}_1$ which implements the first measurement, and for each
$b$ let $\mathcal{M}_2(b) = \{E_c^{(b)}\}$ denote the POVM on $\mathcal{H}_2$
which implements the second measurement. We then define a joint POVM
$\mathcal{M}_{12}$ on $\mathcal{H}_1 \otimes \mathcal{H}_2$, namely
$\{E_b \otimes E_c^{(b)}\}$. Note that although each element of
$\mathcal{M}_{12}$ is a product, the joint measurement need not be the product
of independent measurements $\mathcal{M}_1 \otimes \mathcal{M}_2$. This is the
result of the fact that the second measurement may be conditioned on the
results of the first. Nevertheless, it is easy to verify that
$\mathcal{M}_{12}$ is a POVM since

$$ \sum_{b,c} E_b \otimes E_c^{(b)} = \sum_b E_b \otimes \Big( \sum_c E_c^{(b)} \Big) = \sum_b E_b \otimes I = I \otimes I. $$
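The completeness of a conditional measurement can be checked on a small case. The sketch below assumes two qubits and a hypothetical rule in which the second qubit is measured in the z basis when the first outcome is 0 and in the x basis when it is 1; the joint elements are products, yet the family is not of the form $\mathcal{M}_1 \otimes \mathcal{M}_2$ for any single second POVM.

```python
def kron(a, b):
    """Kronecker (tensor) product of two square matrices as nested lists."""
    m = len(b)
    size = len(a) * m
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(size)]
            for i in range(size)]

def add(a, b):
    n = len(a)
    return [[a[i][j] + b[i][j] for j in range(n)] for i in range(n)]

Z0 = [[1.0, 0.0], [0.0, 0.0]]     # |0><0|
Z1 = [[0.0, 0.0], [0.0, 1.0]]     # |1><1|
XP = [[0.5, 0.5], [0.5, 0.5]]     # |+><+|
XM = [[0.5, -0.5], [-0.5, 0.5]]   # |-><-|

first = [Z0, Z1]                        # M_1 = {E_b}
second = {0: [Z0, Z1], 1: [XP, XM]}     # M_2(b), chosen after seeing b

# Joint conditional POVM {E_b (x) E_c^(b)}.
joint = [kron(first[b], e_c) for b in (0, 1) for e_c in second[b]]

total = [[0.0] * 4 for _ in range(4)]
for element in joint:
    total = add(total, element)
```

The four joint elements sum to the 4 x 4 identity, as the displayed computation requires.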

The information content of a channel using such conditioned measurements is

$$ I^q_{\Phi_1 \otimes \Phi_2}(\mathcal{E}_{12};\mathcal{M}_{12}) = I^q(\tilde{\mathcal{E}}_{12};\mathcal{M}_{12}) = I^q(\mathcal{E}_{12};\hat{\mathcal{M}}_{12}) \tag{24} $$

where $\hat{\mathcal{M}}_1$, $\hat{\mathcal{M}}_2(b)$ and
$\hat{\mathcal{M}}_{12}$ denote the POVM's in which $E_b$ is replaced by
$F_b = \hat{\Phi}_1(E_b)$ and $E_c^{(b)}$ is replaced by
$F_c^{(b)} = \hat{\Phi}_2(E_c^{(b)})$, and we have used the notation defined
in (1) and (2). Because we are interested in studying the capacity for a
fixed set of POVM's, we use the form
$I^q(\mathcal{E}_{12};\hat{\mathcal{M}}_{12})$ and proceed as if we were
considering a noiseless channel with a restricted POVM of the above form.
Although this viewpoint is useful, it is not essential. The argument would
work equally well if we explicitly included the stochastic maps or used the
form $I^q(\tilde{\mathcal{E}}_{12};\mathcal{M}_{12})$ and defined reduced
density matrices using partial traces acting on, e.g.,
$(\Phi_1 \otimes \Phi_2)(\rho_j)$.

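For any input ensemble $\mathcal{E}_{12}$ we now define a pair of associated
input ensembles on $\mathcal{H}_1$ and $\mathcal{H}_2$ respectively. For this
purpose it is useful to let $T_j$ denote the partial trace over
$\mathcal{H}_j$. First, let $\rho_j^{(1)} = T_2[\rho_j]$ be the indicated
reduced density matrix and $\mathcal{E}_1 = \{\pi_j, \rho_j^{(1)}\}$. This is
our ensemble on $\mathcal{H}_1$. Second, for each $j$ and $b$, define a state
on $\mathcal{H}_2$ by

$$ \rho_{j,b}^{(2)} = p(b|j)^{-1}\, T_1\big[ (\rho_j)(F_b \otimes I) \big] \tag{25} $$

where $p(b|j) = \mathrm{Tr}\,[\rho_j (F_b \otimes I)]$. Then the corresponding
input ensemble on $\mathcal{H}_2$ is
$\mathcal{E}_2(b) = \{p(j|b), \rho_{j,b}^{(2)}\}$, where
$p(j|b) = p(b|j)\pi_j / p(b)$ and

$$ p(b) = \mathrm{Tr}\,\Big[ \Big( \sum_j \pi_j \rho_j \Big)(F_b \otimes I) \Big]. \tag{26} $$

We claim that

$$ I^q(\mathcal{E}_{12};\hat{\mathcal{M}}_{12}) = I^q(\mathcal{E}_1;\hat{\mathcal{M}}_1) + \sum_b p(b)\, I^q[\mathcal{E}_2(b);\hat{\mathcal{M}}_2(b)]. \tag{27} $$

Since

$$ I^q[\mathcal{E}_2(b);\hat{\mathcal{M}}_2(b)] = I^q_{\Phi_2}[\mathcal{E}_2(b);\mathcal{M}_2(b)] \le C_{\rm Shan}(\Phi_2) \tag{28} $$

it follows immediately from (27) that

$$ \begin{aligned} I^q(\mathcal{E}_{12};\hat{\mathcal{M}}_{12}) &\le I^q(\mathcal{E}_1;\hat{\mathcal{M}}_1) + \sum_b p(b)\, C_{\rm Shan}(\Phi_2) \\ &= I^q(\mathcal{E}_1;\hat{\mathcal{M}}_1) + C_{\rm Shan}(\Phi_2). \qquad (29) \end{aligned} $$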

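Taking the supremum over ensembles and conditional measurements of this type,
which we now emphasize by writing $\mathcal{M}_{12}^{\rm cond}$, gives

$$ \begin{aligned} \sup_{\mathcal{E}_{12},\mathcal{M}_{12}^{\rm cond}} I^q_{\Phi_1 \otimes \Phi_2}(\mathcal{E}_{12};\mathcal{M}_{12}^{\rm cond}) &= \sup_{\mathcal{E}_{12},\hat{\mathcal{M}}_{12}^{\rm cond}} I^q(\mathcal{E}_{12};\hat{\mathcal{M}}_{12}^{\rm cond}) \\ &\le C_{\rm Shan}(\Phi_1) + C_{\rm Shan}(\Phi_2). \qquad (30) \end{aligned} $$

However by restricting to product ensembles and product POVM's in the sup on
the left side of (30), and using additivity of the classical capacity (7), we
deduce

$$ \sup_{\mathcal{E}_{12},\mathcal{M}_{12}^{\rm cond}} I^q_{\Phi_1 \otimes \Phi_2}(\mathcal{E}_{12};\mathcal{M}_{12}^{\rm cond}) \ge C_{\rm Shan}(\Phi_1) + C_{\rm Shan}(\Phi_2). \tag{31} $$

Hence we have equality in (30).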

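Now consider the n-fold product channel $\Phi_1 \otimes \cdots \otimes \Phi_n$.
Let $\mathcal{M}^{\rm cond}$ be a conditional POVM on
$\mathcal{H}_1 \otimes \cdots \otimes \mathcal{H}_n$. By assumption, every
operator in this POVM has the form $E_b \otimes E_c^{(b)}$ where $\{E_b\}$ is a
conditional POVM $\mathcal{N}^{\rm cond}$ on
$\mathcal{H}_1 \otimes \cdots \otimes \mathcal{H}_{n-1}$, and for each $b$, the
$\{E_c^{(b)}\}$ constitute a POVM on $\mathcal{H}_n$. Also, for any input
ensemble $\mathcal{E}$ on $\mathcal{H}_1 \otimes \cdots \otimes \mathcal{H}_n$,
let $\mathcal{E}'$ be the ensemble of reduced density matrices on
$\mathcal{H}_1 \otimes \cdots \otimes \mathcal{H}_{n-1}$. Then (29) implies

$$ \sup_{\mathcal{E},\mathcal{M}^{\rm cond}} I^q_{\Phi_1 \otimes \Phi_2 \otimes \cdots \otimes \Phi_n}(\mathcal{E};\mathcal{M}^{\rm cond}) \le \sup_{\mathcal{E}',\mathcal{N}^{\rm cond}} I^q_{\Phi_1 \otimes \Phi_2 \otimes \cdots \otimes \Phi_{n-1}}(\mathcal{E}';\mathcal{N}^{\rm cond}) + C_{\rm Shan}(\Phi_n). \tag{32} $$

Iterating (32) gives

$$ \sup_{\mathcal{E},\mathcal{M}^{\rm cond}} I^q_{\Phi_1 \otimes \cdots \otimes \Phi_n}(\mathcal{E};\mathcal{M}^{\rm cond}) \le \sum_{k=1}^{n} C_{\rm Shan}(\Phi_k). \tag{33} $$

The definition of conditional capacity is

$$ C_{EP}^{\rm cond}(\Phi) = \lim_{n\to\infty} \frac{1}{n} \sup_{\mathcal{E},\mathcal{M}^{\rm cond}} I^q_{\Phi^{\otimes n}}(\mathcal{E};\mathcal{M}^{\rm cond}). \tag{34} $$

Hence if we let $\Phi_k = \Phi$ ($k = 1, 2, \ldots$) it follows immediately
from (33) that

$$ C_{EP}^{\rm cond}(\Phi) \le C_{\rm Shan}(\Phi). \tag{35} $$

Since the capacity of the product channel is never less than the sum of the
channel capacities, i.e., $C_{EP}^{\rm cond}(\Phi) \ge C_{\rm Shan}(\Phi)$, we
must have equality in (35), which proves Theorem 10.

It is worth noting that our argument can be used to prove a somewhat stronger
result, namely the additivity of
$\sup_{\mathcal{E}} I^q_\Phi(\mathcal{E};\mathcal{M}^{\rm cond})$ for any
fixed conditional measurement $\mathcal{M}^{\rm cond}$.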

All that remains is to verify (27) which is, except for notation, equivalent
to the following result from classical information theory: for any random
variables $J, B, C$

$$ I^c(J; B, C) = I^c(J; B) + I^c(J; C|B). \tag{36} $$

Although the derivation of (36) is quite elementary (see for example [5],
[17]), for completeness we include it in Appendix A, where we also show its
equivalence to (27).

Acknowledgment: It is a pleasure to thank C.H. Bennett, J.A. Smolin and B.M.
Terhal for useful discussions which helped to crystallize our understanding of
this problem, and P. Shor for communicating his independent proof of
Theorem 10. We are also grateful to the referee for an extremely careful
reading of the previous version.



A Appendix: A Useful Information Identity. 

First we relate (27) to an expression involving classical mutual information.
The input alphabet of the product channel can be described by a classical
discrete random variable $J$, whose distribution is given by the input
ensemble $\mathcal{E}_{12}$, that is $P(J=j) = \pi_j$. The output alphabet can
be described similarly by a pair of random variables $B, C$, corresponding to
the joint POVM $\hat{\mathcal{M}}_{12}$. The joint distribution of $J, B, C$ is
given by application of the formula (10), namely

$$ P(J=j, B=b, C=c) = p(j,b,c) = \pi_j\, \mathrm{Tr}\,\big[ (\rho_j)\, F_b \otimes F_c^{(b)} \big]. \tag{37} $$

Applying the definitions in (1), (9) and (37) gives directly

$$ I^c(J; B, C) = I^q(\mathcal{E}_{12};\hat{\mathcal{M}}_{12}). \tag{38} $$

Furthermore, by summing over $c$ in (37) and conditioning on $j$, it follows
that

$$ p(b|j) = \mathrm{Tr}\,[(\rho_j)\, F_b \otimes I] = \mathrm{Tr}\,[(\rho_j)^{(1)} F_b]. \tag{39} $$

Comparing with the definition of the ensemble $\mathcal{E}_1$, it follows that

$$ I^c(J; B) = I^q(\mathcal{E}_1;\hat{\mathcal{M}}_1). \tag{40} $$

For the second term on the right side of (36), recall that by definition

$$ I^c(J; C|B) = \sum_b p(b)\, I^c(J; C \,|\, \{B = b\}). \tag{41} $$

Also

$$ p(c|j,b) = \frac{p(j,b,c)}{p(j,b)} = \mathrm{Tr}\,\big[ (\rho_{j,b})^{(2)} F_c^{(b)} \big] \tag{42} $$

and $p(j|b) = p(j,b)/p(b) = p(b|j)\pi_j/p(b)$, so therefore

$$ I^c(J; C \,|\, \{B = b\}) = I^q(\mathcal{E}_2(b);\hat{\mathcal{M}}_2(b)). \tag{43} $$

Hence equations (27) and (36) are identical.

As noted before, (36) is a standard result in information theory. We include
its derivation for completeness. The left side can be rewritten as

$$ I(J; B, C) = H(J) + H(B, C) - H(J, B, C) \tag{44} $$

where $H(X)$ is the classical entropy of the random variable $X$. The two
terms on the right side of (36) are respectively

$$ I(J; B) = H(J) + H(B) - H(J, B) \tag{45} $$

$$ I(J; C|B) = H(J|B) + H(C|B) - H(J, C|B). \tag{46} $$

Further, for any random variables $X$ and $Y$,

$$ H(X|Y) = H(X, Y) - H(Y), \tag{47} $$

and therefore (46) can be written as

$$ I(J; C|B) = H(J, B) - H(B) + H(C, B) - H(B) - H(J, C, B) + H(B). \tag{48} $$
Adding (45) and (48) gives the right side of (44), which proves the result.
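The identity can also be confirmed numerically. The sketch below assumes a randomly generated joint distribution of three discrete variables (with a fixed seed, purely for illustration) and compares $I(J; B, C)$ with $I(J; B) + \sum_b p(b)\, I(J; C \,|\, \{B = b\})$; the function names are illustrative.

```python
import math
import random

def mutual_information(joint):
    """I(X;Y) in bits for a joint distribution given as {(x, y): p}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

random.seed(0)
raw = {(j, b, c): random.random()
       for j in range(3) for b in range(2) for c in range(2)}
norm = sum(raw.values())
p = {key: value / norm for key, value in raw.items()}   # p(j, b, c)

# Left side: I(J; B, C), treating the pair (b, c) as one variable.
lhs = mutual_information({(j, (b, c)): v for (j, b, c), v in p.items()})

# First term on the right: I(J; B) from the marginal p(j, b).
p_jb = {}
for (j, b, c), v in p.items():
    p_jb[(j, b)] = p_jb.get((j, b), 0.0) + v
first = mutual_information(p_jb)

# Second term: sum_b p(b) I(J; C | {B = b}).
second = 0.0
for b0 in range(2):
    p_b = sum(v for (j, b, c), v in p.items() if b == b0)
    conditional = {(j, c): v / p_b for (j, b, c), v in p.items() if b == b0}
    second += p_b * mutual_information(conditional)
```

Up to rounding, `lhs` equals `first + second` for any joint distribution, which is the chain rule used in the proof.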